Non-record: Does SLOT violate causal dependence? (empirical test + question) #1240
Open
andrewbaggio1 wants to merge 2 commits into openai:main
Combines Full Hessian GPTQ, legal score-first chunked TTT (3 epochs), and SLOT delta optimization (8 AdamW steps per batch). All eval-time techniques are single-pass, score-before-update compliant. 3-seed mean: 1.1064 +/- 0.0004 BPB on 8xH100 SXM. Beats verified SOTA (openai#1019, 1.1147) by 0.0083 BPB. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Empirical test showing SLOT violates causal dependence (Issue openai#1017 Rule 1). Includes reproducible proof script, output log, and full writeup. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
What this is
I wrote a script with a coding agent that tests whether SLOT violates causal
dependence. I think the answer is yes.
My Test
we flipped.
Without SLOT, changing a target has no effect on other positions because the
model is causal. With SLOT, every single scored position is affected, and I
found a 100% violation rate across 240 tested pairs.
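The idea behind the test can be sketched with a toy model. This is not the PR's `prove_slot_causal_violation.py`; everything in it is an illustrative assumption: the toy `base_logits` model, a shared logit offset standing in for the SLOT delta, and plain gradient descent in place of AdamW. The last token is only ever a target, never an input, so a causal scorer cannot let it influence the scores at earlier positions.

```python
import math

VOCAB = 5  # hypothetical toy vocabulary size

def base_logits(prefix):
    # Toy causal "model": logits depend only on the tokens seen so far.
    s = sum((i + 1) * tok for i, tok in enumerate(prefix))
    return [math.sin(s + v) for v in range(VOCAB)]

def log_probs(prefix, delta):
    # Log-softmax of the base logits shifted by the shared offset `delta`.
    logits = [l + d for l, d in zip(base_logits(prefix), delta)]
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def fit_delta(tokens, steps=8, lr=0.5):
    # SLOT-like step (assumption: shared logit offset, plain gradient descent)
    # that minimizes the mean NLL over every target in the window.
    n = len(tokens) - 1
    delta = [0.0] * VOCAB
    for _ in range(steps):
        grad = [0.0] * VOCAB
        for t in range(n):
            lp = log_probs(tokens[: t + 1], delta)
            for v in range(VOCAB):
                onehot = 1.0 if v == tokens[t + 1] else 0.0
                grad[v] += (math.exp(lp[v]) - onehot) / n
        delta = [d - lr * g for d, g in zip(delta, grad)]
    return delta

def nll(tokens, use_slot):
    # Per-position NLL of tokens[t + 1] given tokens[: t + 1].
    delta = fit_delta(tokens) if use_slot else [0.0] * VOCAB
    return [-log_probs(tokens[: t + 1], delta)[tokens[t + 1]]
            for t in range(len(tokens) - 1)]

def count_violations(tokens, use_slot, tol=1e-9):
    # Flip the final token, which is a target only, and count how many
    # earlier positions change their score.
    flipped = tokens[:-1] + [(tokens[-1] + 1) % VOCAB]
    a, b = nll(tokens, use_slot), nll(flipped, use_slot)
    return sum(abs(x - y) > tol for x, y in zip(a[:-1], b[:-1]))

tokens = [3, 1, 4, 1, 0, 2, 3, 0]
print("without SLOT:", count_violations(tokens, use_slot=False))
print("with SLOT:   ", count_violations(tokens, use_slot=True))
```

In this toy the first count is zero by construction, while the second is nonzero because the fitted offset depends on every target in the window, including the flipped one.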
I also checked self-prediction: is P(x_{t+1}) better when x_{t+1} is actually
in the optimization targets vs when it's swapped for a random token? Yes, by
+0.24 nats (shared delta) and +0.73 nats (per-sample + logit bias).
I'm flagging my own submission (PR #1209).
I looked at PR #1229, which pushes SLOT to its logical extreme: per-sample
delta, per-sample logit bias, and scored-position masking. It gets 0.9300 BPB,
which is the best on the non-verified leaderboard and 0.19 below merged SOTA.
In its defense, the mechanism is the same as shared-delta SLOT, just with
more parameters to memorize the evaluation targets.
Counterargument
@AnubhavBharadwaaj correctly points out that in a stride=64 sliding window,
1984/2048 tokens are already-scored context, so ~97% of the gradient comes
from known tokens. I think this is a fair point: the leakage in shared-delta
SLOT is small. But it's not zero, and "a little bit of future information"
is still future information.
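The 1984/2048 figure follows from the window arithmetic. A small check, assuming each window advances by `stride` tokens so that only the final `stride` positions are newly scored, and assuming every token contributes equally to the gradient (the variable names are illustrative):

```python
window = 2048   # tokens per evaluation window
stride = 64     # newly scored tokens per window

context = window - stride          # already-scored tokens in the window
known_fraction = context / window  # share of targets that are already known
new_fraction = stride / window     # share that is still unscored

print(context, f"{known_fraction:.1%}", f"{new_fraction:.1%}")
```

Under those assumptions roughly 3% of the optimization targets in each window are tokens that have not been scored yet, which is the small but nonzero leakage discussed above.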
Reproducing
# No GPU needed. ~30 seconds on CPU.
python prove_slot_causal_violation.py
Request for a decision
@0hq @valerio-oai SLOT has been debated across PRs #1084, #1128, #1172,
#1176, #1209 without a ruling. I'd really appreciate you weighing in!
Full writeup in the README generated by Claude Code.